{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COMPSCI 389: Introduction to Machine Learning\n", "# Topic 0.1: Python and Jupyter Notebooks\n", "\n", "In this course we will use Python, and specifically Jupyter Notebooks like this one. This notebook provides a brief introduction to Python and Jupyter notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python\n", "\n", "Python is a high-level programming language.\n", "- It is *interpreted*, meaning that it is not compiled ahead of time into an executable, but is instead executed directly from the source code by a program called an \"interpreter\".\n", "- It is a popular language for machine learning.\n", "- It is *very* slow in comparison to compiled languages.\n", "    - Many Python libraries call compiled C or C++ code internally, which makes them efficient.\n", "    - Efficient use of Python leverages these library calls for anything compute-intensive.\n", "    - Even when writing careful Python, I've found that students (from BS through PhD) produce Python code that is around 6x to 100x slower than the corresponding C++ code.\n", "- Python code is typically stored in `.py` files.\n", "- Common integrated development environments (IDEs, programs for writing and running Python files) include Visual Studio Code (VSCode) and PyCharm." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jupyter Notebooks\n", "\n", "Jupyter notebooks were previously called IPython notebooks, when they were restricted to Python. They have since been extended to work with many different programming languages (within a single file!) and were thus renamed to Jupyter notebooks. However, the file extension retains the old name: `.ipynb` for \"IPYthon NoteBook\".\n", "\n", "This document is a Jupyter notebook. It consists of a vertical stack of \"cells\". Each cell has a type, such as \"markdown\" or \"Python\".\n", "\n", "You can edit a cell by double clicking on it. 
If you double click on this cell, you should see the raw markdown (a lightweight markup language for formatting text). The type of the cell is shown in the bottom right of the cell; in this case it should say \"markdown\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you click on the cell type shown in the bottom right of a cell, you can change the cell to a different type. We will mainly use:\n", "- Markdown cells for displaying text.\n", "- Python cells for displaying *and running* code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you have finished editing a markdown cell, you can click the check mark in the top right of the cell to stop editing it. This renders the cell, and we say the cell is \"run\". You can also press `ctrl+enter` to run the current cell, or `shift+enter` to run the current cell and move to the next one. If you press `shift+enter` on the last cell, a new cell is automatically created after it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Markdown Cells\n", "\n", "(Edit this cell to see the underlying formatting.)\n", "\n", "In markdown cells you can have **bold** or *italic* text. 
You can have inline code like `this` or code blocks like this:\n", "```\n", "print(\"Hello World!\")\n", "```\n", "You can have block quotes like this:\n", "> I am not a crook.\n", "\n", "You can have [links](https://google.com).\n", "\n", "You can have comments like this: \n", "\n", "You can have images like this: (commented out!)\n", "\n", "\n", "If you want to control the width so it's not too big, you can have images like this (which sets the width to 400 pixels):\n", "\"Wikipedia\n", "\n", "You can make horizontal lines like this:\n", "\n", "---\n", "\n", "You can change the font color like this.\n", "\n", "You can have headers:\n", "# Header 1\n", "## Header 2\n", "### Header 3\n", "#### Header 4\n", "\n", "You can make lists like this:\n", "- Dog\n", "- Cat\n", "    - Tabby\n", "    - Calico\n", "- Mouse\n", "\n", "Or like this:\n", "1. Apple\n", "2. Orange\n", "3. Pear\n", "\n", "You can include math like this: $\\pi \\neq \\int_{-\\infty}^\\infty x^2 \\, \\text{d}x$. The language used to display math is called LaTeX. Most computer science papers are written using LaTeX. A free and commonly used online LaTeX editor is available at https://www.overleaf.com/. VSCode supports only a subset of LaTeX's features, but that subset is enough to write basic equations.\n", "\n", "Here's another example that creates an \"align\" block in LaTeX, which lines up the equations at the & character on each line.\n", "$$\n", "\\begin{align}\n", "a =& b + c \\\\\n", "x =& y - z.\n", "\\end{align}\n", "$$\n", "\n", "[And much much more](https://www.markdownguide.org/basic-syntax/)!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python Cells\n", "\n", "Python cells contain Python code, but are not automatically run. 
Python cells can be run by clicking the triangle \"run\" button in the top left of the cell or by using the `ctrl+enter` (run cell) or `shift+enter` (run cell and move to the next cell) commands.\n", "\n", "The first time that you run a cell, VSCode may prompt you for two things.\n", "\n", "1. \"Do you trust the authors of the files in this workspace?\" Running a notebook executes its code on your computer, so only do this if you trust the source of the notebook.\n", "2. What \"Kernel\" would you like to use? This is asking which Python installation to use. Depending on your operating system, the current selection should be visible somewhere near the top-right or bottom-right of VSCode. It should say, for example, \"Python 3.11.7\". If you click this text, you can select a different version of Python (or a different virtual environment) to use.\n", "\n", "Once you have trusted the workspace and selected a Python kernel, the cell should run and show its output below the code." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World\n", "We can print many things! 42\n" ] } ], "source": [ "print(\"Hello World\")\n", "print(\"We can\", \"print many things!\", 42) # print separates its arguments with a single space" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python Basics\n", "\n", "In Python:\n", "- Variable types do not need to be declared.\n", "- Indentation (whitespace at the start of a line) is used to denote where the bodies of if-statements, loops, function definitions, etc. begin and end.\n", "- Packages are installed using commands like this, run in the command line:\n", "\n", "> pip install numpy\n", "\n", "Note that this will install numpy into your default Python installation. 
If you're using a different Python kernel in VSCode, you need to ensure that you install numpy (or the desired library) for that kernel! From within a notebook, running `%pip install numpy` in a code cell installs the package into the kernel that the notebook is currently using.\n", "\n", "Here is an example:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is less than b\n", "We got here 1.\n" ] } ], "source": [ "a = 10 # Comments come after '#' symbols. Notice that we didn't declare the type of a.\n", "b = 20 # Semicolons are optional at the end of lines, and are usually not included!\n", "\n", "if a < b:\n", "    print(\"a is less than b\")\n", "    print(\"We got here 1.\") # This is within the if-statement because it is indented." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Libraries are loaded with `import` statements. In Java you write:\n", "```\n", "import java.util.Scanner;\n", "```\n", "\n", "In Python you write:\n", "```\n", "import math\n", "```\n", "\n", "Or, if you only want one specific function:\n", "```\n", "from math import sqrt\n", "```\n", "\n", "Or, if you want to use the math library but don't want to type out \"math\" every time, you can give it a shorter name:\n", "```\n", "import math as mth\n", "```" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "import math\n", "print(math.sqrt(16))" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "from math import sqrt\n", "\n", "print(sqrt(16))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "import math as mth\n", "\n", "print(mth.sqrt(16)) # This is a silly example, but for long library names this can save a lot of space."
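] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alias created with `as` is just a second name for the same module object, so `mth.sqrt` and `math.sqrt` refer to the same function. As a small sketch, the cell below checks this using Python's built-in `is` operator, which tests whether two names refer to the same object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "import math as mth\n", "\n", "# Both names are bound to the same module object, so this prints True.\n", "print(mth is math)\n", "\n", "# Looking up sqrt through either name gives the same function object (True).\n", "print(mth.sqrt is math.sqrt)\n", "\n", "# So the results are identical too (True).\n", "print(mth.sqrt(16) == math.sqrt(16))"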
] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "# Like variables, names imported in earlier cells persist (until the kernel is restarted)!\n", "print(sqrt(16)) # This uses the sqrt imported earlier by: from math import sqrt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy\n", "\n", "NumPy is a common library used for numerical computing. It is mainly used for its `ndarray` object, which represents multi-dimensional arrays.\n", "\n", "Here's an example:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original array: [1 2 3 4 5]\n", "Array plus 10: [11 12 13 14 15]\n", "Array squared: [ 1 4 9 16 25]\n", "Mean of the array: 3.0\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Creating a NumPy array\n", "arr = np.array([1, 2, 3, 4, 5])\n", "\n", "# Basic operations\n", "arr_plus_10 = arr + 10 # Add 10 to each element\n", "arr_squared = arr ** 2 # Square each element\n", "\n", "# Displaying the results\n", "print(\"Original array:\", arr)\n", "print(\"Array plus 10:\", arr_plus_10)\n", "print(\"Array squared:\", arr_squared)\n", "\n", "# Applying a mathematical function\n", "mean_value = np.mean(arr)\n", "print(\"Mean of the array:\", mean_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing Arrays\n", "\n", "NumPy allows you to specify sub-arrays in a convenient manner. In the example below we create a 2-dimensional array (a matrix). We can access this with `arr_2d[i,j]` to get the element in the i'th row and j'th column (counting from 0). We can get sub-arrays by specifying ranges of values for i and j. 
\n", "\n", "Note that `:` means \"all indices\" along that dimension.\n", "\n", "Here is an example:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original Array:\n", " [[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n", "Selected Columns:\n", " [[2 3]\n", " [5 6]\n", " [8 9]]\n" ] } ], "source": [ "# Creating a 2D NumPy array\n", "arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", "\n", "selected_columns = arr_2d[:, 1:3] # Get all rows, and columns 1 to 3\n", "\n", "print(\"Original Array:\\n\", arr_2d)\n", "print(\"Selected Columns:\\n\", selected_columns)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whoa! Notice that columns 1:3 resulted in only two columns! What's going on?\n", "\n", "The notation `i:j` means to take indices `i` through `j-1`. This convention makes it easier to reference elements when you know the length of an array: using `0:n` for a length `n` array will give all elements, `0` to `n-1`. 
In the example above, `1:3` includes the middle and last columns (indices 1 and 2), but not the first (index 0).\n", "\n", "We can also index backwards from the end of an array using negative indices:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The last element is 5.\n", "Original Array: [1 2 3 4 5]\n", "All but last element: [1 2 3 4]\n" ] } ], "source": [ "# Create a 1D NumPy array\n", "arr_1d = np.array([1, 2, 3, 4, 5])\n", "\n", "# Print the last element (index -1 refers to the last position)\n", "print(f\"The last element is {arr_1d[-1]}.\")\n", "\n", "# Using [:-1] indexing to select all elements except the last one\n", "all_but_last = arr_1d[:-1]\n", "\n", "print(\"Original Array:\", arr_1d)\n", "print(\"All but last element:\", all_but_last)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use an array of Booleans (with the same length as the array) to index into an array. This selects exactly the elements where the Boolean array is `True`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "indices = [False False False False True True True]\n", "above_threshold = [20 25 30]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Creating a NumPy array\n", "arr = np.array([1, 5, 10, 15, 20, 25, 30])\n", "\n", "# Define the threshold\n", "threshold = 15\n", "\n", "# Mark which elements are above the threshold\n", "indices = arr > threshold # This is an array of Booleans (a \"mask\"), not a list of integer indices\n", "print(\"indices = \", indices)\n", "\n", "# Get the corresponding values\n", "above_threshold = arr[indices] # Equivalent to arr[arr > threshold]\n", "print(\"above_threshold = \", above_threshold)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types\n", "\n", "We can get the type of an object using the `type` function."
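, "\n", "\n", "For example, here is `type` applied to a few built-in values (chosen arbitrarily, just for illustration):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch: type() works on any value, not only on named variables.\n", "print(type(3.5)) # <class 'float'>\n", "print(type(\"hello\")) # <class 'str'>\n", "print(type([1, 2, 3])) # <class 'list'>\n", "print(type(True)) # <class 'bool'>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we check the type of the variable `a` that we defined earlier:"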
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'int'>\n" ] }, { "data": { "text/plain": [ "int" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Print the type of a, the integer we defined near the start of this notebook.\n", "print(type(a))\n", "\n", "# Notice that in a notebook, display() shows the result more cleanly:\n", "display(type(a))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "display(type(indices)) # This tells us it's a NumPy ndarray, but not what type of elements the array holds." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('bool')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(indices.dtype) # This tells us the data type (dtype) of the elements inside the NumPy array." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }